BUG:fix rolling skew kurt floating issue #18065

zartbot · 2017-11-01T16:08:43Z

closes rolling skew and rolling kurt has wrong answer under all eq values #18044
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

TomAugspurger · 2017-11-01T20:47:25Z

Thanks, the failure on Travis is unrelated.

Could you add a note to whatsnew/v0.21.1.txt and a unit test (maybe using the example from #18044)?

pep8speaks · 2017-11-02T03:16:58Z

Hello @ogotaiking! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on November 07, 2017 at 13:51 Hours UTC

codecov · 2017-11-02T03:17:11Z

Codecov Report

Merging #18065 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #18065      +/-   ##
==========================================
- Coverage   91.25%   91.23%   -0.02%     
==========================================
  Files         163      163              
  Lines       50120    50120              
==========================================
- Hits        45737    45728       -9     
- Misses       4383     4392       +9

Flag	Coverage Δ
#multiple	`89.04% <ø> (ø)`	⬆️
#single	`40.32% <ø> (-0.06%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.75% <0%> (-0.1%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 15fa4bd...58299cb. Read the comment docs.

codecov · 2017-11-02T03:17:44Z

Codecov Report

Merging #18065 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #18065      +/-   ##
==========================================
- Coverage   91.25%   91.23%   -0.02%     
==========================================
  Files         163      163              
  Lines       50124    50124              
==========================================
- Hits        45742    45733       -9     
- Misses       4382     4391       +9

Flag	Coverage Δ
#multiple	`89.05% <ø> (ø)`	⬆️
#single	`40.32% <ø> (-0.06%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.75% <0%> (-0.1%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bc69dc6...ee5a33d. Read the comment docs.

jreback · 2017-11-04T23:30:58Z

doc/source/whatsnew/v0.21.1.txt

@@ -57,6 +57,8 @@ Documentation Changes
 Bug Fixes
 ~~~~~~~~~
 - Bug in ``DataFrame.resample(...).apply(...)`` when there is a callable that returns different columns (:issue:`15169`)
+- Bug in ``pd.Series.rolling.skew()`` and ``rolling.kurt()`` with all equal values has floating issue (:issue:`18044`)


producing numerical unstable results rather than NaN

The original behavior for skewness and kurtosis calculation is :
if the {Xi} distribution is a uniform distribution ( all values are equal), Thus the variance of the values( Sum(Xi-Xmean) ^2 is zero. so B should be zero in this case. But some numerically unstable issue cause B not to be zero.
so consider this situation, we assume if {Xi}'s variance is less than 1e-14, we treat it as uniform distribution.
based on previous behavior is return Nan. I don't want to break this behavior.
Some reference :
from mathematica, for uniform distribution skew/kurt return Nan
http://www.wolframalpha.com/input/?i=skewness%7B1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1%7D
http://www.wolframalpha.com/input/?i=kurtosis%7B1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1%7D

jreback · 2017-11-04T23:31:17Z

pandas/_libs/window.pyx

@@ -788,7 +788,7 @@ cdef inline double calc_skew(int64_t minp, int64_t nobs, double x, double xx,
        A = x / dnobs
        B = xx / dnobs - A * A
        C = xxx / dnobs - A * A * A - 3 * A * B
-        if B <= 0 or nobs < 3:
+        if B <= 1e-14 or nobs < 3:


can you add a comment here about what we are checking (and below)

jreback · 2017-11-04T23:32:04Z

pls rebase on master as well.

zartbot · 2017-11-06T03:06:33Z

@jreback for the comments for return Nan for numerically unstable issue[ seems lost the comments during rebase]

consider the uniform distribution {Xi}, X1=X2=....Xn-1=Xn, the variance sum(Xi-Xm)^2/n should be Zero, but some floating issue cause the variance is in range(0,1e-14), and the variance is in the denominator, that will cause skew/kurt result with an extremely large number. so we can treat if the variance is less than 1e-14, it's uniform distribution.

The original code only checks if the variance == Zero, and return Nan for uniform distribution, so I don't want to break this behavior and leave the result as Nan.

some reference :
http://www.wolframalpha.com/input/?source=nav&i=skewness%7B1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1%7D

http://www.wolframalpha.com/input/?source=nav&i=kurtosis%7B1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1%7D

result is Nan(undefined)

jreback · 2017-11-06T13:34:02Z

@ogotaiking not sure of your last, is this a different case?

zartbot · 2017-11-07T03:59:43Z

@jreback same case. and also a reference from original skewness calculation code

pandas/core/nanops.py

def nanskew(values, axis=None, skipna=True):
    ...
    # floating point error
    m2 = _zero_out_fperr(m2) <- here m2 is eq B in rolling_skew
    m3 = _zero_out_fperr(m3)

    with np.errstate(invalid='ignore', divide='ignore'):
        result = (count * (count - 1) ** 0.5 / (count - 2)) * (m3 / m2 ** 1.5)

and the function _zero_out_fperr

_zero_out_fperr(arg):
    if isinstance(arg, np.ndarray):
        with np.errstate(invalid='ignore'):
            return np.where(np.abs(arg) < 1e-14, 0, arg)
    else:
        return arg.dtype.type(0) if np.abs(arg) < 1e-14 else arg

so here we treat the B <1e-14 is eq 0 in calc_skew and calc_kurt is following the original skew/kurt behavior.

jreback · 2017-11-07T13:14:30Z

@ogotaiking ok I c. can you add update the comment in the code above to include a refernce to _zero_fperr (and maybe a reference in zero_fperr to calc_kurt/calc_skew), just so that future readers will know that these are connected. otherwise lgtm. ping when pushed / green.

jreback · 2017-11-07T13:37:08Z

pandas/_libs/window.pyx

+        #         cause B != 0. and cause the result is a very
+        #         large number.
+        #
+        #         in core/nanops.py nanskew/nankurt call the function


can you add the reference to _zero_fperr as well (back to here)

jreback · 2017-11-07T14:18:36Z

lgtm merge on green

jreback · 2017-11-08T03:00:42Z

thanks @ogotaiking

(cherry picked from commit 93c755e)

TomAugspurger added this to the 0.21.1 milestone Nov 1, 2017

TomAugspurger added the Numeric Operations Arithmetic, Comparison, and Logical operations label Nov 1, 2017

jreback requested changes Nov 4, 2017

View reviewed changes

zartbot added 6 commits November 5, 2017 22:31

fix rolling skew kurt floating issue

b32fe26

update whatsnew

ca253dd

add unit test case

51aa486

fix pep8 issue

3522031

update with review comments

74efc42

remove trailing whitespace

9a968dd

update with review comments

3e20523

jreback reviewed Nov 7, 2017

View reviewed changes

update nanops.py to reference

ee5a33d

jreback approved these changes Nov 7, 2017

View reviewed changes

jreback merged commit 93c755e into pandas-dev:master Nov 8, 2017

watercrossing pushed a commit to watercrossing/pandas that referenced this pull request Nov 10, 2017

BUG:fix rolling skew kurt floating issue (pandas-dev#18065)

7062687

No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017

BUG:fix rolling skew kurt floating issue (pandas-dev#18065)

7581861

TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Dec 8, 2017

BUG:fix rolling skew kurt floating issue (pandas-dev#18065)

0b57dd8

(cherry picked from commit 93c755e)

TomAugspurger pushed a commit that referenced this pull request Dec 11, 2017

BUG:fix rolling skew kurt floating issue (#18065)

c3c0b26

(cherry picked from commit 93c755e)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG:fix rolling skew kurt floating issue #18065

BUG:fix rolling skew kurt floating issue #18065

zartbot commented Nov 1, 2017 •

edited

Loading

TomAugspurger commented Nov 1, 2017

pep8speaks commented Nov 2, 2017 •

edited

Loading

codecov bot commented Nov 2, 2017

codecov bot commented Nov 2, 2017 •

edited

Loading

jreback Nov 4, 2017

zartbot Nov 5, 2017

jreback Nov 4, 2017

jreback commented Nov 4, 2017

zartbot commented Nov 6, 2017

jreback commented Nov 6, 2017

zartbot commented Nov 7, 2017

jreback commented Nov 7, 2017

jreback Nov 7, 2017

jreback commented Nov 7, 2017

jreback commented Nov 8, 2017

BUG:fix rolling skew kurt floating issue #18065

BUG:fix rolling skew kurt floating issue #18065

Conversation

zartbot commented Nov 1, 2017 • edited Loading

TomAugspurger commented Nov 1, 2017

pep8speaks commented Nov 2, 2017 • edited Loading

Comment last updated on November 07, 2017 at 13:51 Hours UTC

codecov bot commented Nov 2, 2017

Codecov Report

codecov bot commented Nov 2, 2017 • edited Loading

Codecov Report

jreback Nov 4, 2017

Choose a reason for hiding this comment

zartbot Nov 5, 2017

Choose a reason for hiding this comment

jreback Nov 4, 2017

Choose a reason for hiding this comment

jreback commented Nov 4, 2017

zartbot commented Nov 6, 2017

jreback commented Nov 6, 2017

zartbot commented Nov 7, 2017

jreback commented Nov 7, 2017

jreback Nov 7, 2017

Choose a reason for hiding this comment

jreback commented Nov 7, 2017

jreback commented Nov 8, 2017

zartbot commented Nov 1, 2017 •

edited

Loading

pep8speaks commented Nov 2, 2017 •

edited

Loading

codecov bot commented Nov 2, 2017 •

edited

Loading